Abstract

This paper analyzes the relative advantages of crossover and mutation on a class of
deterministic and stochastic additively separable problems. The study assumes that the
recombination and mutation operators have knowledge of the building blocks (BBs) and
effectively exchange or search among competing BBs. Facetwise models of convergence time
and population sizing are used to determine the scalability of each algorithm. The analysis
shows that for additively separable deterministic problems, BB-wise mutation is more
efficient than crossover, while crossover outperforms mutation on additively separable
problems perturbed with additive Gaussian noise. The results show that the speed-up of using
BB-wise mutation on deterministic problems is $O(\sqrt{k}\log m)$, where $k$ is the BB size
and $m$ is the number of BBs. Likewise, the speed-up of using crossover on stochastic
problems with fixed noise variance is $O(m\sqrt{k}/\log m)$.



1 Introduction 

The great debate between crossover and mutation has consumed much ink and many trees over
the years. When mutation works, it is lightning quick and uses small or nonexistent
populations. Crossover, when it works, seems able to tackle more complex problems, but
setting the population size and other parameters is a challenge. Comparisons between the two
are usually written by a researcher with an axe to grind: they are usually empirical, the
basis for comparison is implicitly or explicitly unfair, and theory is nonexistent. Wouldn't
it be nice to compare our two favorite genetic operators on a fair basis, on an interesting
class of problems, and let them slug it out head to head?

That's what we do here. Assuming that both the recombination and mutation operators possess
linkage (or neighborhood) knowledge, we pit them against each other in solving boundedly
difficult additively separable problems with and without additive exogenous noise. We use a
recombination operator that exchanges building blocks (BBs) without disrupting them and a
mutation operator that performs local search among competing building-block neighborhoods.
The motivation for this study also comes from the recent local-search literature, where
authors have highlighted the importance of using a good neighborhood operator (Barnes,
Dimova, & Dokov, 2003; Watson, 2003). However, a systematic method of designing a good
neighborhood operator for a class of search problems is still an open question. We
investigate whether using a neighborhood operator that searches among competing BBs of a
problem would be advantageous and, if so, under what circumstances.

This paper is organized as follows. The next section gives a brief review of related
literature. We provide an outline of the crossover-based and mutation-based genetic
algorithms (GAs) in Section 3. Facetwise models are developed to determine the scalability
of the crossover-based and the BB-wise mutation-based GAs for deterministic fitness
functions in Section 4 and for noisy fitness functions in Section 5. Finally, we discuss
future research directions, followed by conclusions.



2 Literature Review 



Over the last few decades many researchers have empirically and theoretically studied where genetic 
algorithms excel. An exhaustive literature review is out of the scope of this paper, and therefore 
we present a brief review of related theoretical studies. 

Several authors have analyzed the scalability of a mutation-based hillclimber and compared
it to the scalability of different forms of genetic algorithms, such as the breeder genetic
algorithm (Mühlenbein, 1991; Mühlenbein, 1992), an ideal genetic algorithm (Mitchell,
Holland, & Forrest, 1994), and a genetic algorithm with culling (Baum, Boneh, & Garrett,
2001). Goldberg (Goldberg, 1999) gave a theoretical analysis of deciding between a single
run with a large-population GA and multiple runs with several small-population GAs, under
the constraint of fixed computational cost. He showed that for uniformly scaled problems a
single run of a large-population GA was advantageous, while for exponentially scaled
problems small-population GAs with multiple restarts were better. Srivastava and Goldberg
(Srivastava & Goldberg, 2001; Srivastava, 2002) empirically verified and analytically
enhanced the time-continuation theory put forth by Goldberg (Goldberg, 1999). Recently,
Cantú-Paz and Goldberg (Cantú-Paz & Goldberg, 2003) investigated scenarios under which
multiple runs of a GA are better than a single GA run. For an exhaustive review of studies
on the advantages and disadvantages of multiple populations, under both serial and parallel
GAs, over a single large-population GA, the reader is referred elsewhere (Cantú-Paz, 2000;
Srivastava, 2002; Luke, 2001; Fuchs, 1999) and to the references therein.



While many of the related studies (Goldberg, 1999; Srivastava & Goldberg, 2001; Cantú-Paz &
Goldberg, 2003) assumed fixed genetic operators with no knowledge of building-block
structure, in this paper we assume that the recombination and mutation operators have
linkage (or neighborhood) knowledge. While the linkage information is usually unknown for a
given search problem, a variety of linkage identification methods can be used to design the
operators (see Goldberg (Goldberg, 2002), Sastry and Goldberg (Sastry & Goldberg, 2004), and
references therein).



3 Preliminaries 

The objective of this paper is to predict the relative computational costs of a
crossover-based and an ideal-mutation-based algorithm for additively separable problems
with and without additive Gaussian noise. Before developing models for estimating the
computational costs, we briefly describe the algorithms and the assumptions used in the
paper.

3.1 Selectorecombinative Genetic Algorithms

We consider a generationwise selectorecombinative GA with non-overlapping populations of
fixed size (Holland, 1975; Goldberg, 1989). We apply crossover with a probability of 1.0 and
do not use any mutation. We assume binary strings of fixed length as the chromosomes. To
ease the analytical burden, the selection mechanism assumed throughout the analysis is
binary tournament selection (Goldberg, Korb, & Deb, 1989). However, the results can be
extended to other tournament sizes and other selection methods in a straightforward manner.
The recombination method used in the analysis is a uniform building-block-wise crossover
(Thierens & Goldberg, 1994). In uniform BB-wise crossover, two parents are randomly selected
from the mating pool and their building blocks in each partition are exchanged with a
probability of 0.5. Therefore, none of the building blocks are disrupted during a
recombination event. The offspring created through crossover entirely replace the parental
individuals.
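
The uniform BB-wise crossover described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation used in the paper: the function name `bbwise_uniform_crossover` is hypothetical, and each BB partition is assumed to be a contiguous group of k bits (in general, a linkage group need not be contiguous).

```python
import random

def bbwise_uniform_crossover(parent_a, parent_b, k, rng=random):
    """Swap whole building blocks between two parents, each with probability 0.5.

    Assumption: partitions are the contiguous slices [0:k], [k:2k], ...
    Because only whole blocks are exchanged, no building block is disrupted.
    """
    if len(parent_a) != len(parent_b) or len(parent_a) % k != 0:
        raise ValueError("parents must have equal length divisible by k")
    child_a, child_b = list(parent_a), list(parent_b)
    for start in range(0, len(parent_a), k):
        if rng.random() < 0.5:
            stop = start + k
            # Exchange the complete block; bits inside it stay together.
            child_a[start:stop], child_b[start:stop] = (
                child_b[start:stop], child_a[start:stop])
    return child_a, child_b
```

With an all-zeros and an all-ones parent, every block of each child is copied intact from one parent or the other, which is the non-disruption property the analysis relies on.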

3.2 Building-Block-Wise Mutation Algorithm (BBMA)

In this paper we consider an enumerative BB-wise mutation operator, in which we start with a
random individual and evaluate all possible schemata in a given partition. That is, for a
building block of size $k$, we evaluate all $2^k$ individuals. The best of the $2^k$
individuals is chosen as the candidate for mutating the BBs of other partitions. In other
words, the BBs in different partitions are mutated sequentially. For a problem with $m$ BBs
of size $k$ each, the BBMA can be described as follows:

1. Start with a random individual and evaluate it.

2. Consider the first non-mutated BB. Here the BB order is chosen arbitrarily from left to
right; however, different schemes can be chosen, or may be required, to decide the order of
BBs.

3. Create $2^k - 1$ unique individuals with all possible schemata in the chosen BB partition.
Note that the schemata in other partitions are the same as in the original individual (from
step 2).

4. Evaluate all $2^k - 1$ individuals and retain the best of the $2^k$ candidates (including
the current individual) for mutation of BBs in other partitions.

5. Repeat steps 2-4 until the BBs of all partitions have been mutated.
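
The five steps above can be sketched as follows. This is a minimal illustration rather than the authors' code: the name `bbma` and its signature are hypothetical, BB partitions are assumed to be contiguous groups of k bits visited left to right, and fitness is assumed to be deterministic and maximized.

```python
from itertools import product

def bbma(fitness, start, k):
    """Enumerative BB-wise mutation; returns (best individual, evaluations used)."""
    if len(start) % k != 0:
        raise ValueError("string length must be divisible by k")
    current = list(start)
    best_fit = fitness(current)           # step 1: evaluate the starting point
    evaluations = 1
    for s in range(0, len(current), k):   # step 2: next non-mutated BB
        original = tuple(current[s:s + k])
        best_block = original
        for block in product((0, 1), repeat=k):
            if block == original:
                continue                  # already evaluated as part of `current`
            # step 3: same individual except for the chosen partition
            candidate = current[:s] + list(block) + current[s + k:]
            f = fitness(candidate)        # step 4: evaluate and keep the best
            evaluations += 1
            if f > best_fit:
                best_fit, best_block = f, block
        current[s:s + k] = best_block     # carry the best BB forward (step 5 loop)
    return current, evaluations
```

On a concatenated deceptive trap started from the all-zeros string, this sketch recovers the all-ones optimum block by block, and the evaluation counter equals one initial evaluation plus $2^k - 1$ evaluations per BB.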

We use an enumerative BB-wise mutation to simplify the analysis; a greedy BB-wise method can
improve the performance of the mutation-based algorithm. A straightforward Markov-process
analysis of a greedy BB-wise mutation algorithm, along the lines of (Mühlenbein, 1991;
Mühlenbein, 1992), indeed shows that the greedy method is on average better than the
enumerative one. However, the analysis also shows that the differences between the greedy
and enumerative BB-wise mutation approaches are small, especially for moderate-to-large
problems. Moreover, the computational cost of an enumerative BB-wise mutation provides an
upper bound on the cost of a greedy BB-wise mutation.

4 Crossover vs. Mutation: Deterministic Fitness Functions 

In this section we analyze the relative computational costs of using a selectorecombinative
GA or a BB-wise mutation algorithm for successfully solving deterministic problems of
bounded difficulty. The objective of the analysis is to answer whether a population-based
selectorecombinative GA is computationally advantageous over a BB-wise mutation-based
algorithm. If one algorithm is better than the other, we are also interested in estimating
the savings in computational time. Note that unlike earlier studies, we assume that the
building-block structure is known to both the crossover and mutation operators.

We begin our analysis with the scalability of selectorecombinative genetic algorithms followed 
by the scalability of the BB-wise mutation algorithm. 



4.1 Scalability of Selectorecombinative GA 

Two key factors for predicting the scalability and estimating the computational costs of a genetic 
algorithm are the convergence time and population sizing. Therefore, in the following subsections 
we present facetwise models of convergence time and population sizing. 



4.1.1 Population-Sizing Model 



Goldberg, Deb, and Clark (Goldberg, Deb, & Clark, 1992) proposed population-sizing models
for correctly deciding between competing BBs. They incorporated noise arising from other
partitions into their model. However, they assumed that if wrong BBs were chosen in the
first generation, the GA would be unable to recover from the error. Harik, Cantú-Paz,
Goldberg, and Miller (Harik, Cantú-Paz, Goldberg, & Miller, 1999) refined this model by
incorporating the cumulative effects of decision making over time rather than in the first
generation only. Harik et al. (Harik, Cantú-Paz, Goldberg, & Miller, 1999) modeled the
decision making between competing BBs as a gambler's ruin problem. Here we use an
approximate form of the gambler's ruin population-sizing model (Harik, Cantú-Paz, Goldberg,
& Miller, 1999):

$$n = \frac{\sqrt{\pi}}{2}\,\frac{\sigma_{BB}}{d}\,2^k \sqrt{m}\,\log m, \tag{1}$$

where $k$ is the BB size, $m$ is the number of BBs, $d$ is the size of the signal between
the competing BBs, and $\sigma_{BB}^2$ is the fitness variance of a building block. The
above equation assumes a failure probability $\alpha = 1/m$.



4.1.2 Convergence-Time Model 



Mühlenbein and Schlierkamp-Voosen (Mühlenbein & Schlierkamp-Voosen, 1993) derived a
convergence-time model for the breeder GA using the notion of selection intensity (Bulmer,
1985) from population genetics. Thierens and Goldberg (Thierens & Goldberg, 1994) derived
convergence-time models for different selection schemes, including binary tournament
selection. Bäck (Bäck, 1994) derived estimates of selection intensity for s-wise tournament
and $(\mu, \lambda)$ selection. Miller and Goldberg (Miller & Goldberg, 1995) developed
convergence-time models for s-wise tournament selection and incorporated the effects of
external noise. Bäck (Bäck, 1995) developed convergence-time models for $(\mu, \lambda)$
selection. Even though the selection-intensity-based convergence-time models were developed
for the OneMax problem, Miller and Goldberg (Miller, 1997) observed that they are generally
applicable to additively decomposable problems of bounded order. Here, we use the
convergence-time model of Miller and Goldberg (Miller & Goldberg, 1995):

$$t_c = \frac{\pi}{2I}\sqrt{\ell}, \tag{2}$$

where $I$ is the selection intensity, and $\ell = mk$ is the string length. For binary
tournament selection, $I = 1/\sqrt{\pi}$.

Using Equations 1 and 2, we can now predict the scalability, or the number of function
evaluations required for successful convergence, of GAs as follows:

$$n_{fe,GA} = n \cdot t_c = \frac{\pi^2}{4}\,\frac{\sigma_{BB}}{d}\,\sqrt{k}\,\log m \cdot 2^k \cdot m. \tag{3}$$
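
As a check on the constant in Equation 3, the product can be worked out explicitly. The steps below assume the approximate gambler's ruin sizing $n = \frac{\sqrt{\pi}}{2}\frac{\sigma_{BB}}{d}2^k\sqrt{m}\log m$, the convergence time $t_c = \frac{\pi}{2I}\sqrt{\ell}$ with $\ell = mk$, and the standard selection intensity $I = 1/\sqrt{\pi}$ for binary tournament selection:

```latex
\begin{align*}
n_{fe,GA} = n \cdot t_c
  &= \left(\frac{\sqrt{\pi}}{2}\,\frac{\sigma_{BB}}{d}\,2^k \sqrt{m}\,\log m\right)
     \left(\frac{\pi}{2I}\,\sqrt{mk}\right) \\
  &= \frac{\sqrt{\pi}}{2}\cdot\frac{\pi\sqrt{\pi}}{2}\,
     \frac{\sigma_{BB}}{d}\,2^k\, m\,\sqrt{k}\,\log m
     \qquad \left(I = 1/\sqrt{\pi}\right) \\
  &= \frac{\pi^2}{4}\,\frac{\sigma_{BB}}{d}\,\sqrt{k}\,\log m \cdot 2^k \cdot m .
\end{align*}
```

The factor $\sqrt{m}$ from the population size and the factor $\sqrt{mk}$ from the convergence time combine into $m\sqrt{k}$, which is where the extra $\sqrt{k}\log m$ over a purely enumerative cost originates.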

4.2 Scalability of BB-wise Mutation Algorithm 

Since the initial point is evaluated once and after that, for each of the $m$ BBs, $2^k - 1$
individuals are evaluated, the total number of function evaluations required for the BBMA is

$$n_{fe,BBMA} = \left(2^k - 1\right) m + 1. \tag{4}$$

The results from the above subsections (Equations 3 and 4) indicate that while the
scalability of a selectorecombinative GA is $O(2^k m \log m)$, the scalability of the BBMA
is $O(2^k m)$. This is in contrast to a random-walk mutation algorithm with no BB knowledge,
which scales as $O(m^k \log m)$ (Mühlenbein, 1992). By searching among building-block
neighborhoods, the selectomutative algorithm scales up significantly better than a mutation
operator with no linkage information and provides a savings of a factor of
$O(\sqrt{k}\log m)$ in evaluations over the GA. The savings come from the extra evaluations
required for convergence and decision making in the selectorecombinative GA.

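The speed-up, which is defined as the ratio of the number of function evaluations required
by a GA to that required by the BBMA, obtained by using a BB-wise mutation algorithm over a
selectorecombinative GA is given by

$$\eta = \frac{n_{fe,GA}}{n_{fe,BBMA}} = O\left(\sqrt{k}\,\log m\right). \tag{5}$$

In particular, the speed-up for the OneMax problem ($k = 1$) is given by

$$\eta_{\mathrm{OneMax}} = \frac{\frac{\pi^2}{4}\, m \log m}{m + 1} \approx \frac{\pi^2}{4}\,\log m, \tag{6}$$

and for the GA-hard $m$ $k$-Trap function (Goldberg, 1987), the speed-up is given by

$$\eta_{\mathrm{Trap}} = \frac{\frac{\pi^2}{4}\,\frac{\sigma_{BB}}{d}\,\sqrt{k}\,\log m \cdot 2^k \cdot m}{\left(2^k - 1\right) m + 1} \approx \frac{\pi^2}{4}\,\frac{\sigma_{BB}}{d}\,\sqrt{k}\,\log m. \tag{7}$$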



The speed-ups predicted by Equations 6 and 7 are verified with empirical results in Figures
1(a) and 1(b), respectively. The results are averaged over 900 independent runs and show
good agreement between the predicted and observed speed-up. They indicate that for
deterministic additively separable problems, a BB-wise mutation algorithm is about
$O(\sqrt{k}\log m)$ times faster than a selectorecombinative genetic algorithm.
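
To get a feel for the magnitudes involved, the two cost models can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, `log` is taken to be the natural logarithm, and the OneMax value $\sigma_{BB}/d = 1/2$ (a single bit with variance 1/4 and signal 1) is an assumption used for the example.

```python
import math

def ga_evaluations(m, k, sigma_bb_over_d):
    """GA cost model: (pi^2 / 4) * (sigma_BB / d) * sqrt(k) * log(m) * 2^k * m."""
    return (math.pi ** 2 / 4) * sigma_bb_over_d * math.sqrt(k) * math.log(m) * 2 ** k * m

def bbma_evaluations(m, k):
    """BBMA cost model: one initial evaluation plus 2^k - 1 per building block."""
    return (2 ** k - 1) * m + 1

def speedup(m, k, sigma_bb_over_d):
    """Ratio of GA evaluations to BBMA evaluations."""
    return ga_evaluations(m, k, sigma_bb_over_d) / bbma_evaluations(m, k)

if __name__ == "__main__":
    for m in (10, 100, 1000):
        # OneMax: k = 1, sigma_BB / d = 0.5 (assumed for this example)
        print(m, round(speedup(m, 1, 0.5), 2))
```

For OneMax the ratio differs from $\frac{\pi^2}{4}\log m$ only by the factor $m/(m+1)$, so the approximation tightens quickly as $m$ grows.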



Figure 1: Empirical verification of the speed-up predicted for using BB-wise mutation over a
selectorecombinative GA by Equations 6 and 7. The empirical results are averaged over 900
independent runs. The results show that the speed-up obtained by the BB-wise mutation
algorithm over a GA is $O(\sqrt{k}\log m)$.

5 Crossover vs. Mutation: Noisy Fitness Functions

In the previous section, we observed that BB-wise mutation scales up better than crossover
on deterministic additively separable problems. Furthermore, a selectomutative algorithm was
able to overcome deception, one of the key factors influencing problem difficulty, using
linkage (neighborhood) information and enumeration within the neighborhood. In this section
we introduce another dimension of problem difficulty, extra-BB noise (Goldberg, 2002), and
analyze whether BB-wise mutation maintains its edge over crossover. That is, we analyze
whether a selectorecombinative or a selectomutative GA works better on additively separable
problems with additive external Gaussian noise.

We follow the same approach outlined in the previous section and consider the scalability of 
crossover and mutation. 



5.1 Scalability of Selectorecombinative GAs 

Again we use the convergence-time and population-sizing models to determine the scalability
of GAs in the presence of unbiased Gaussian noise. We use an approximate form of the
gambler's ruin population-sizing model for noisy environments:

$$n = \frac{\sqrt{\pi}}{2}\,\frac{\sigma_{BB}}{d}\,2^k \sqrt{m}\,\log m\,\sqrt{1 + \frac{\sigma_N^2}{\sigma_f^2}}, \tag{8}$$

where $\sigma_N^2$ is the variance of the noise, and $\sigma_f^2$ is the fitness variance.



We use an approximate form of Miller and Goldberg's (Miller & Goldberg, 1995)
convergence-time model:

$$t_c = \frac{\pi}{2I}\,\sqrt{\ell}\,\sqrt{1 + \frac{\sigma_N^2}{\sigma_f^2}}. \tag{9}$$



A detailed derivation of the above equation and other approximations are given elsewhere
(Goldberg, 2002; Sastry, 2001).

The population-sizing and convergence-time models indicate that exogenous noise increases
the population size and elongates the convergence time. Using Equations 8 and 9, we can now
predict the scalability, or the number of function evaluations required for successful
convergence, of GAs as follows:

$$n_{fe,GA} = \frac{\pi^2}{4}\,\frac{\sigma_{BB}}{d}\,\sqrt{k}\,\log m \left(1 + \frac{\sigma_N^2}{\sigma_f^2}\right) 2^k \cdot m. \tag{10}$$



5.2 Scalability of BB-wise Mutation Algorithm 

Unlike the deterministic case, where a BB was perturbed and evaluated once, in the presence
of exogenous noise we cannot rely on a single evaluation. Instead, an average of multiple
samples of the fitness should be used in deciding between competing building blocks. The
question is exactly how many samples must be considered. The number of fitness samples
required to correctly decide between competing building blocks in the presence of noise has
been addressed elsewhere (Goldberg, Deb, & Clark, 1992):

$$n_s = \frac{2c\,\sigma_N^2}{d^2}, \tag{11}$$



where $n_s$ is the number of independent fitness samples, and $c$ is the square of the
ordinate of a one-sided standard Gaussian deviate at a specified error probability $\alpha$.
For low error values, $c$ can be obtained by the usual approximation for the tail of a
Gaussian distribution: $\alpha \approx \exp(-c/2)/\sqrt{2c}$. In this paper we have used
$\alpha = 1/m$. Equation 11 is empirically verified for the Noisy-OneMax problem in Figure 2.
The results show good agreement between the model and experiments.

Figure 2: Comparison of the number of fitness samples per individual required to correctly
decide between competing building blocks, as predicted by Equation 11, with empirical
results.
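
The sample count can be computed directly from its definition. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and instead of the tail approximation quoted above it obtains $c$ from the exact standard normal quantile via Python's `statistics.NormalDist`, so its values may differ slightly from those produced by the approximation.

```python
import math
from statistics import NormalDist

def samples_per_individual(alpha, noise_variance, d):
    """Number of fitness samples n_s = 2 * c * sigma_N^2 / d^2, rounded up.

    c is the squared one-sided standard Gaussian deviate at error probability
    alpha, computed here from the exact quantile (an assumption of this sketch;
    the text uses a tail approximation for small alpha).
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = z * z
    return math.ceil(2.0 * c * noise_variance / d ** 2)

if __name__ == "__main__":
    m = 100
    # alpha = 1/m as in the text; unit signal and unit noise variance assumed
    print(samples_per_individual(1.0 / m, 1.0, 1.0))
```

Because $n_s$ is linear in the noise variance, doubling $\sigma_N^2$ roughly doubles the number of samples each candidate needs.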

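Since the initial point is evaluated $n_s$ times and after that, for each of the $m$ BBs,
$2^k - 1$ individuals are evaluated $n_s$ times each, the total number of function
evaluations required for the BBMA for noisy fitness functions is given by

$$n_{fe,BBMA} = n_s\left[\left(2^k - 1\right) m + 1\right] = 2c\,\frac{\sigma_N^2}{\sigma_f^2}\,\frac{m\,\sigma_{BB}^2}{d^2}\left[\left(2^k - 1\right) m + 1\right], \tag{12}$$

where the second form uses $\sigma_f^2 = m\,\sigma_{BB}^2$.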
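
The explicit averaging that the BBMA needs under noise can be sketched as a wrapper around a noisy fitness function, together with the resulting evaluation count. The names `averaged_fitness` and `noisy_bbma_evaluations` are hypothetical, and the sketch assumes the same number of samples $n_s$ is used for every individual.

```python
from statistics import fmean

def averaged_fitness(noisy_fitness, n_s):
    """Wrap a noisy fitness function so each call averages n_s independent samples."""
    def evaluate(individual):
        return fmean(noisy_fitness(individual) for _ in range(n_s))
    return evaluate

def noisy_bbma_evaluations(m, k, n_s):
    """Total evaluations: n_s samples for each of the (2^k - 1) * m + 1 individuals."""
    return n_s * ((2 ** k - 1) * m + 1)
```

Every individual the deterministic BBMA would evaluate once is now evaluated $n_s$ times, which is why the total cost is simply the deterministic count multiplied by $n_s$.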

The results from the above subsections (Equations 10 and 12) indicate that in the presence
of exogenous noise, a selectorecombinative GA scales as
$O\left(2^k m \log m \left(1 + \sigma_N^2/\sigma_f^2\right)\right)$. On the other hand, the
BB-wise mutation scales as $O\left(2^k m^2 \left(\sigma_N^2/\sigma_f^2\right)\right)$.
Therefore, for constant values of $\sigma_N^2/\sigma_f^2$, a selectorecombinative GA is
$O(\sqrt{k}\,m/\log m)$ times faster than the BB-wise mutation. By implicitly averaging out
the exogenous noise, crossover is able to overcome the extra effort needed for convergence
and decision making. On the other hand, the explicit averaging via multiple fitness samples
by the BB-wise mutation leads to an order-of-magnitude increase in the number of function
evaluations.

Figure 3: Empirical verification of the speed-up predicted for using a selectorecombinative
GA over BB-wise mutation by Equation 14 for the OneMax problem with exogenous noise. The
empirical results are averaged over 900 independent runs. The results show that a
selectorecombinative GA uses significantly fewer function evaluations than the BB-wise
mutation algorithm.

The speed-up, which is defined as the ratio of the number of function evaluations required
by mutation to that required by crossover, obtained by using a selectorecombinative GA over
a selectomutative GA is given by

$$\eta = \frac{n_{fe,BBMA}}{n_{fe,GA}} = O\left(\frac{\sqrt{k}\,m}{\log m}\cdot\frac{\sigma_N^2/\sigma_f^2}{1 + \sigma_N^2/\sigma_f^2}\right). \tag{13}$$



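In particular, the speed-up for the OneMax problem ($k = 1$) is given by

$$\eta_{\mathrm{Noisy\ OneMax}} \approx \frac{4c}{\pi^2}\,\frac{m}{\log m}\cdot\frac{\sigma_N^2/\sigma_f^2}{1 + \sigma_N^2/\sigma_f^2}. \tag{14}$$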



The speed-up predicted by Equation 14 is verified with empirical results in Figure 3. The
results are averaged over 900 independent runs and show good agreement between the predicted
and observed speed-up. They indicate that for stochastic additively separable problems with
constant noise variance, a selectorecombinative GA is about $O(\sqrt{k}\,m/\log m)$ times
faster than the BB-wise mutation algorithm.



6 Future Work 

The results of this paper indicate that there are significant advantages to using a mutation
operator that performs hillclimbing in the BB space, and they suggest many avenues of future
research, some of which are listed below:

Hybridization of crossover and BB-wise mutation: While this paper considers a bounding case
of crossover vs. mutation, it might be (and more likely is) more effective to use an
efficient hybrid of crossover and mutation.

Designing BB-wise Mutation: In this paper we assumed that the BB information was known,
which generally is not the case. Over the last few years, effective recombination operators
that adapt linkage have been developed in a systematic manner (Goldberg, 2002). On the other
hand, most mutation operators, including adaptive ones, search in the local neighborhood of
a solution. Furthermore, there has been growing evidence of the importance of using good
neighborhood operators in determining the effectiveness of local-search methods (Barnes,
Dimova, & Dokov, 2003; Watson, 2003). Despite the importance of having good neighborhood
information, a general methodology for designing operators with good neighborhood
information is nonexistent. That is, little attention has been paid to systematically
designing effective mutation operators that perform local search in the building-block space
(Sastry & Goldberg, 2004). The results of this paper indicate that the dividends obtained by
designing BB-wise mutation operators that adaptively identify and utilize good neighborhood
information can be significant.

Problems with overlapping building blocks: While this paper considered problems with
non-overlapping building blocks, many problems have different building blocks that share
common components. An analysis similar to the one presented in this paper can be performed
to predict which of the two algorithms excels. However, since the effect of overlapping
variable interactions is similar to that of exogenous noise (Goldberg, 2002), the results of
this paper suggest that crossover is likely to be more useful than mutation for solving
problems with overlapping building blocks.

Hierarchical problems: One important class of nearly decomposable problems is hierarchical
problems, in which building-block interactions are present at more than a single level.
Further investigation is necessary to analyze whether BB-wise mutation can help improve the
scalability of selectorecombinative GAs.



7 Summary & Conclusions 

In this paper, we have introduced a building-block-wise mutation operator that efficiently
searches among competing building-block (BB) neighborhoods. We also compared the
computational costs of the BB-wise mutation algorithm with those of a selectorecombinative
genetic algorithm for both deterministic and stochastic additively separable problems. Our
results show that while BB-wise mutation provides a significant advantage over crossover for
deterministic problems, crossover maintains a significant edge over BB-wise mutation on
stochastic problems. The results show that the speed-up of using BB-wise mutation on
deterministic problems is $O(\sqrt{k}\log m)$, where $k$ is the BB size and $m$ is the number
of BBs. Likewise, the speed-up of using crossover on stochastic problems with fixed noise
variance is $O(m\sqrt{k}/\log m)$.